[Hexagon][UnitTest] Disable flaky quantization test#16337
Merged
junrushao merged 2 commits into apache:main on Jan 3, 2024
Conversation
The `test_pass_fq2i_avg_pool2d.py::test_avgpool_conv2d` test is sensitive to rounding errors, and failed about a third of the time (42 / 100 runs). This was first noticed as CI failures in unrelated PRs (e.g. https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-hexagon/detail/PR-16184/6/tests). This commit marks the flaky portions of the test with `pytest.mark.xfail`, to avoid breaking CI for other PRs.

To minimize the extent of the disabled test cases, this commit breaks up each of the unit tests. Where previously a single test performed both hardware/simulation tests and relay graph comparisons, these are now done in separate test functions. The hardware/simulation tests use `tvm.testing.assert_allclose` with a tolerance of `1e-02`, while the graph-comparison tests use `tvm.ir.structural_equal` and require identical floating-point values. Only the graph-comparison test is disabled here.

The other two test cases in `test_pass_fq2i_avg_pool2d.py` do not show this same sensitivity, with no failures seen in 100 executions.
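The split described above can be illustrated with a minimal sketch. The values below are hypothetical stand-ins for the test's outputs, not taken from the PR; they show why a rounding-level deviation passes the tolerant `assert_allclose` check but fails an exact comparison like `tvm.ir.structural_equal` on floating-point constants:

```python
import numpy as np

# Hypothetical stand-in values for the two comparison styles used after
# the test split. A deviation well under the 1e-02 tolerance still breaks
# exact equality, which is why only the graph-comparison half is flaky.
expected = np.array([0.25, 0.50, 0.75])
actual = expected + 1e-4  # a rounding-level deviation

# 1. Hardware/simulation check: tolerant numeric comparison, mirroring
#    tvm.testing.assert_allclose with a tolerance of 1e-02. This passes.
np.testing.assert_allclose(actual, expected, atol=1e-2)

# 2. Graph comparison: requires bit-identical floating-point values,
#    analogous to tvm.ir.structural_equal on constant nodes. This is the
#    half marked with pytest.mark.xfail in the PR.
exact_match = bool(np.array_equal(actual, expected))
print(exact_match)  # False: the tiny deviation fails exact comparison
```

Separating the two checks means the tolerant numeric coverage keeps running on every CI job, while only the strict structural comparison is expected to fail.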
Contributor (Author)
@rasagna-quic Can you take a look at this test case? It was introduced in #15599, and is failing about 1/3 of the time. This PR is a stopgap to avoid impacting other work, but a better long-term fix is still needed.
Contributor
@Lunderberg Thank you for these changes, these look good to me. I will create a new PR to fix this issue.